browsing a media file wherein a user selects at least one 
feature in a media file and is provided with information 
regarding the existence of the selected feature in the media 
file. Based on the information, the user can identify and 
playback portions of interest in a media file. Features in a 
media file, such as a speaker's identity, applause, silence, 
motion, or video cuts, are preferably automatically time- 
wise evaluated in the media file using known methods. 
Metadata generated based on the time-wise feature evalua- 
tion are preferably mapped to confidence score values that 
represent a probability of a corresponding feature's exist- 
ence in the media file. Confidence score information is 
preferably presented graphically to a user as part of a 
graphical user interface, and is used to interactively browse 
the media file. 
FIG. 13 is a schematic block diagram of a preferred media
browser in accordance with the invention;

FIG. 14 is a flow chart of steps of a preferred method for
selecting a media file to be browsed;

FIG. 15 is a preferred media selection screen;

FIG. 16 is a flow chart of steps of a preferred method for
mapping feature metadata to a corresponding confidence
score;

FIG. 17 is a flow chart of steps of a preferred method for
processing confidence scores and providing information
related to the confidence scores;

FIG. 18 is a flow chart of steps of a preferred method for
controlling playback functions based on a selected feature;
and

FIG. 19 is a flow chart of steps of a preferred method for
reviewing a media file based on selection of a confidence
score indication.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The invention is described in connection with browsing a
media file. As used herein, browsing refers to any operation
related to reviewing, manipulating, editing, selecting
portions of, storing, transmitting, or other operations related
to a media file. In addition, a preferred embodiment of a
graphical user interface for a media browser is described
below having specific functions. In many cases, various
functions of the media browser can be eliminated or altered,
as desired. For example, information that is displayed
graphically in the embodiment described below can be
provided aurally or otherwise, as described more fully
below.

As discussed above, several methods exist for automatically
deriving information that represents the likelihood of
the existence of a feature in a media data stream. Such
information, called metadata, can be very useful and provides
cues for locating portions of interest in a media file.
For example, automatic methods exist for identifying silent
portions in audio data. Identifying such portions can be
useful in editing the audio data because, if silent portions are
identified, an edit region of the audio data can be bounded
by the silent portions, thereby insuring that speech in the edit
region is not cut mid-word or mid-sentence.

As another example, a common video analysis technique
is the detection of shot boundaries, which represent
significant differences between consecutive video frames. As is
well known, the difference between consecutive video
frames can be determined based on statistical properties of
the pixels in the frame images, histogram differences,
compression algorithms, edge differences, or motion detection.
Application of one of several known techniques for detecting
shot boundaries results in a series of time-varying
metadata values that each represent a difference between
consecutive video frames. Once metadata regarding the shot
boundary feature is determined, information can be provided
to a user to aid the user in locating portions of interest in the
media file, e.g., places in the media file where scene changes
occur.

Although metadata can be automatically generated, metadata
can be input by a user or group of users. For example,
a user could review a media file and identify sections of the
media file where a particular speaker is featured, e.g., the
speaker is either shown visually or is speaking. After
identifying portions of the media file where a speaker is featured,
the user can manually enter metadata that indicates the
user's subjective evaluation of whether the speaker is
present in the media file at each time-wise position in the
media file. The user can enter the metadata continuously
after viewing each video frame or listening to each
predefined sub-portion of the media file. Alternately, the user
could identify only those portions of the media file where
the speaker is featured, and all other portions of the media
file could be assigned metadata values that indicate the
speaker is not present in the media file.

FIG. 1 shows an example graphical user interface 10 in
accordance with the invention. The graphical user interface
10 is typically displayed on a video display, such as a
computer monitor, but can be presented to a user on any
other display device. The interface 10 has a video display
portion 1 in which a still video frame representation of an
image contained in a media file can be presented. Video
frames can also be displayed at any desired video frame rate,
such as that performed on a standard computer monitor or
television set. Audio representations of sound information
contained in a media file can be presented to a user by a
speaker (not shown). Alternately, visual representations of
sound information, such as a waveform, can be presented on
the graphical user interface 10.

A playback control bar 2 contains playback controls that
are used to control how a visual and/or aural representation
of a media file is presented to a user. For example, the
playback control bar 2 can include buttons for controls such
as play, stop, rewind, fast forward, and forward and reverse
index. The playback controls shown in FIG. 1 are not
intended to be an exhaustive representation of all of the
different types of playback controls that can be provided. For
example, controls can be provided to alter the effective
frame rate at which a video display is generated or the speed
at which a sound representation is played back. As with
other features of the interface 10, a user activates controls on
the playback control bar 2 by clicking a desired button with
a mouse. Other methods for activating the controls
can be used, such as individual keystrokes or combinations
of keystrokes on a keyboard, selection of controls using a
touch screen or light pen, or other selection device.

A timeline 3 includes a thumb 31 that indicates a current
time-wise position in the media file. The timeline 3 shown
in FIG. 1 ranges from time 0:00 to time 47:38 and can be
optionally zoomed in or out and panned using the timeline
controls 32.

The thumb 31 can be moved time-wise along the timeline
3 by using the left and right timeline controls 32. Alternately,
a user could select the thumb 31 using a mouse and move the
thumb 31 to a desired position along the timeline 3 using the
mouse. By moving the thumb 31 time-wise along the
timeline 3, a corresponding video and/or aural representation
of the media file located at the time-wise position of the
thumb 31 can be played back. For example, video frames or
preselected keyframes at corresponding time-wise locations
in the media file can be displayed in the video display
portion 1 as the thumb 31 is moved. Alternately, video/aural
playback could begin from a corresponding position in the
media file after the thumb 31 is moved to a desired position
and released.

A user can select a feature in the media file using a media
feature selection 4. The feature selection 4 includes a scroll
bar for navigating through a list of selectable features.
Preferably, features in the feature selection 4 are selected by
double clicking a desired feature with a mouse cursor.
Alternately, a user could type in a selected feature using a
keyboard or otherwise enter a selected feature, such as by
MEDIA BROWSER USING MULTIMODAL 
ANALYSIS 

BACKGROUND OF THE INVENTION 

1. Field of Invention 

This invention relates to a method and apparatus for 
reviewing an aural and/or visual and/or other representation 
of a media file. Specifically, the invention relates to using 
media content features to allow a user to more easily review 
a media file. 

2. Description of Related Art 

Text documents often have many cues, such as headings, 
paragraphs, punctuation, etc., that allow a reader to quickly 
determine the beginning and end of different sections of the 
document and to aid the reader in finding areas of interest. 
However, video and audio browsing systems typically do 
not provide information to the user regarding simple 
features, like section beginning and end points, much less 
more complicated information, like the name of the speaker 
on a video clip. Such browsing systems typically offer only 
standard "VCR-type" playback control options, like play, 
stop, rewind, and fast forward. As anyone who has tried to 
find a specific video clip on a conventional video tape using 
a standard VCR will understand, it is often difficult to locate 
portions of interest in a video using the standard playback 
controls. 

Many techniques exist for extracting information that 
represents the feature content of a media file. In this 
application, the term media or media file is used to represent 
any data stream that contains information regarding video or 
other image information, audio information, text informa- 
tion and/or other information. A feature of a media file is a 
property of the video, audio and/or text information in the 
media file, such as video or audio format, or information 
relating to the content of the media file, such as the identity 
of a speaker depicted in a video sequence, occurrences of 
applause, video shot boundaries, or motion depicted in a 
video sequence. For example, Pfeiffer et al., "Automatic 
Audio Content Analysis" ACM MULTIMEDIA 96, Boston, 
MA, 1996, pp. 21-30; Wilcox, et al., "Segmentation of 
Speech Using Speaker Identification", Proc. ICASSP 94, 
vol. SI, Apr. 1994, pp. 161-164; and Foote, "Rapid Speaker 
ID Using Discrete MMI Feature Quantisation," Expert
Systems with Applications, vol. 13, no. 4, 1997, pp. 293-289,
describe various methods for identifying audio features,
such as music, human speech, and speaker identity.
Regarding video data, Boreczky et al., "Comparison of Video Shot
Boundary Detection Techniques" Proc. SPIE Conf. On
Storage and Retrieval for Still Image and Video Databases IV,
San Jose, CA, vol. 2670, Feb. 1996, pages 170-179, and
Zhang et al., "Automatic Partitioning of Full-motion Video",
Multimedia Systems, vol. 1, no. 1, 1993, pp. 10-28, disclose
methods for identifying shot boundaries (radical changes in 
video content) and motion. As described in these and other 
similar references, features in a media file can be identified 
automatically using any of a number of different techniques. 

SUMMARY OF THE INVENTION 

Providing feature information in a media browsing system 
can be very useful for a user when identifying areas of 
interest in a media file, controlling media playback, editing 
a media file, or performing other operations with a media 
file. For example, graphically identifying areas in a media 
file where a particular speaker is shown on a video clip can 
allow a user to quickly determine and playback those 
portions that contain the speaker. 
Providing feature information to the user based on
automatically identified features also eliminates the need for a
user to manually index or otherwise mark significant
portions of the media file for later retrieval. Thus, the invention
can use existing methods for automatically identifying
features in a media file to generate and provide feature
information to a user to aid the user in browsing the media file.

The invention provides a media browser that uses media
feature information as an aid in navigating, selecting,
editing, and/or annotating a media file.

In one aspect of the invention, media features are selected
by a user.

In one aspect of the invention, media browsing functions,
such as play, rewind, stop, fast-forward, index, automatic
slide show, and automatic preview, are controlled based on
feature information.

In one aspect of the invention, feature information for a
selected feature is mapped to a corresponding confidence
score.

In one aspect of the invention, the media browser includes
a feature indicator that provides information related to a
corresponding selected feature based on a corresponding
confidence score.

In one aspect of the invention, a feature indicator
combines at least two confidence scores and provides
information based on the combination.

In one aspect of the invention, a feature indicator provides
information related to a confidence score based on a value of
another confidence score.

The invention also provides a method for browsing a
media file. A feature of the media file being browsed is
selected and information related to a confidence score for at
least one selected feature is provided. The confidence score
relates to the existence of a corresponding selected feature in
the media file. Based on the information related to the
confidence score, a portion of the media file is selected.

In one aspect of the invention, a metadata value
representing a time-wise evaluation of a feature in the media file
is mapped to a corresponding confidence score.

In one aspect of the invention, mapping of a metadata
value to a corresponding confidence score is non-linear.

In one aspect of the invention, mapping of a metadata
value to a confidence score is dependent on a user-defined
control value or values.

BRIEF DESCRIPTION OF THE DRAWINGS 

The invention will be described in relation to the
following drawings in which reference numerals refer to like
elements, and wherein:

FIG. 1 is an example screen display for a graphical user
interface in accordance with the invention;

FIG. 2 shows example metadata values, corresponding
confidence score information, and an example graphical
depiction of corresponding feature information;

FIGS. 3-9 show different ways of graphically depicting
information related to a feature in a media file;

FIG. 10 is an example screen display depicting how
features can be combined and corresponding combined
feature information can be shown;

FIG. 11 is an example screen display depicting how
display of feature information relating to a corresponding
feature can be conditioned on another feature;

FIG. 12 is a flow chart explaining basic steps of a
preferred method for browsing a media file;
speaking a selected feature, which is recognized by the
browsing system using voice recognition techniques.
Preferably, the features listed in the feature selection 4 are
organized in alphabetical order or otherwise grouped
logically. For example, video-related features are preferably
grouped apart from audio-related features.

Selected features are displayed in selected feature
windows 5. In the example graphical user interface 10 of FIG.
1, three selected feature windows 5-1, 5-2, and 5-3 are
shown. However, fewer or more selected feature windows 5
could be used. Corresponding to each selected feature
window 5 is a feature indication window 6. The feature
indication window 6 preferably graphically indicates the
time-wise presence or absence of a corresponding feature in the
media file, but the feature indication window 6 could display
other information related to the corresponding feature. For
example, the feature indication window 6-3 graphically
displays the time-wise presence of applause in the media
file. In this example, solid vertical lines displayed in a
feature indication window 6 indicate a determination that the
corresponding feature is likely present in the media file at the
corresponding time-wise location in the media file. However,
other representations are possible, as discussed more fully
below.

A feature summary bar 7 preferably graphically displays
a summary of the selected features. In this example, the
feature summary bar 7 includes graphical icons 8 that each
indicate, or represent, specific information. For example, the
scissors icon 8 indicates the presence of a cut or shot detect
in the media file. The icons 8 can also indicate information
related to two or more features. For example, a rectangular
icon 8 containing a circle indicates a portion of the media file
that contains an occurrence of applause followed shortly
after a shot detect occurrence. This is only one example of
the possible icons 8 that can be used to summarize selected
features. Additional summary representations are discussed
below.

Key playback buttons 9 are also provided for each of the
selected features and for the feature summary bar 7. The key
playback buttons 9 allow a user to select a feature that is
used to control playback functions of the playback control
bar 2. For example, if the key playback button 9 for the
selected feature "speaker MD" is selected, the play and/or
index buttons in the playback control bar 2 could be
controlled such that whenever the button is selected, portions of
the media file containing the speaker MD are "jumped" to
and played back. Other portions of the media file are skipped
and not presented to the user. As another example, if the key
playback button 9 for the selected feature "applause" is
selected, the index buttons in the playback control bar 2
could be controlled such that whenever an index button is
selected, a representation of the media file begins
immediately before or after occurrences of applause in the media
file.

The interface 10 also includes a confidence score value
manipulation control 11 that allows a user to adjust
confidence score values for corresponding features as desired. As
explained more fully below, a confidence score relates to a
corresponding feature, is preferably determined using
metadata for a corresponding feature, and preferably is used to
generate the information displayed in the feature indication
windows 6.

Because automatic techniques for generating metadata for
media features do not always work reliably, the metadata
itself is not always easy to interpret, and the metadata rarely
positively indicate the absolute presence or absence of a
feature in a media file, automatically derived metadata is
preferably mapped to a corresponding confidence score. The
confidence score is thus intended to convey to a user, in an
understandable form, the probability of a feature's existence
in the media file. Preferably, the metadata is mapped to a
corresponding time-varying confidence score using some
function that is based on prior knowledge about the
metadata's reliability. However, mapping of the metadata to a
confidence score can be performed in a one-to-one
relationship, where individual metadata values are
equal to corresponding confidence score values.

FIG. 2 shows an example of shot detect metadata derived
from a media file containing video information. The
metadata values are represented graphically, and in this example,
relatively larger metadata values represent relatively large
differences between consecutive video frames at a
corresponding time-wise position in the media file.

In the example shown in FIG. 2, metadata representing
shot boundary values are mapped to a confidence score
using a threshold. If a metadata value is above the threshold,
the corresponding confidence score is assigned a value of
one, indicating a relatively high probability that a video cut
is present in the media file. If the metadata value is below the
threshold, the corresponding confidence score is assigned a
value of zero. Preferably, the threshold is adjusted
automatically or by a user based on the reliability of the metadata.
The confidence scores for the shot detection feature can be
depicted in a feature indication window 6 of a graphical user
interface 10 as shown in FIG. 2. In this example, dark
portions of the feature indication window 6 represent a
confidence score of zero and light portions of the feature
indication window 6 represent confidence score values of
one.

As will be understood by those of ordinary skill in the art,
the metadata values for a media feature can be mapped to a
confidence score in many different ways. As discussed
above, the metadata values are preferably mapped to
corresponding confidence score values using a function that is
determined based on the feature corresponding to the
metadata and the reliability of metadata. Linear or non-linear
mapping, including normalization, thresholding and
filtering, can be performed for numerical metadata. For
non-numerical metadata, such as text, closed captions
(subtitles), or MIDI streams, metadata can be mapped to
confidence score values using statistical or other
mathematical methods. For example, the time-wise frequency of a
word or set of words in a media data stream could be
converted to a set of confidence score values where a
relatively high confidence score value is assigned to a
portion of the media file that contains relatively more
occurrences of the word or set of words, and lower
confidence scores are assigned to portions containing fewer
occurrences of the word or set of words.

For multi-dimensional data, such as metadata generated
by a face recognition algorithm, a lower dimensional
confidence score can be computed. In a preferred embodiment,
a face recognition algorithm generates five numerical values
for each face detected in a video frame: the coordinates of
the upper left and lower right corners of the bounding
rectangle of the candidate face, and how well the candidate
matches the face template. A confidence score for the face is
preferably determined according to Equation 1:

C = MS × CA / MFA    (1)

where C is the confidence score, MS is the value
representing how well the candidate matches the face template, CA is
the area of the candidate face determined from the upper left
and lower right corner coordinates, and MFA is the maximum
face area (a constant). In this example, CA is preferably
not larger than, or is constrained to be not larger than,
MFA.
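
The face confidence computation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the score is the match value MS scaled by the area ratio CA/MFA, that MS lies between 0 and 1, and that image coordinates increase to the right and downward. The names FaceCandidate and face_confidence are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FaceCandidate:
    """Five values per detected face: two corner coordinates and a match score."""
    left: float
    top: float
    right: float
    bottom: float
    match_score: float  # MS, assumed to lie in [0, 1]


def face_confidence(face: FaceCandidate, max_face_area: float) -> float:
    """Reduce the five-value face metadata to one confidence score.

    Assumed form: C = MS * CA / MFA, with CA constrained to be
    no larger than MFA so that the area ratio never exceeds 1.
    """
    if max_face_area <= 0:
        raise ValueError("max_face_area (MFA) must be positive")
    width = max(0.0, face.right - face.left)
    height = max(0.0, face.bottom - face.top)
    candidate_area = min(width * height, max_face_area)  # CA <= MFA
    return face.match_score * candidate_area / max_face_area
```

Under these assumptions, a large, well-matched face scores close to MS, while a small face in the background scores lower even when it matches the template well.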

Mapping metadata values to confidence score values can
be performed according to functions that are "learned" using
techniques such as learning Bayesian networks. For
example, a confidence score mapping system could "learn"
that metadata determined for a particular feature is more or
less reliable than originally anticipated, and adjust the
mapping function, e.g., a mapping threshold, accordingly.

Mapping of metadata values is also preferably controlled
based on user control values. For example, a user can adjust
a threshold value or other control values using the
confidence score value manipulation control 11 in the graphical
user interface 10.

Using the example in FIG. 2, adjusting the sliders on the
confidence score value manipulation control 11 can adjust
the threshold value or values used to map the metadata to
confidence scores. By adjusting the threshold value(s), the
user can adjust the sensitivity of the mapping function to the
metadata and customize the feature information displayed in
the feature indicator windows 6. In the FIG. 2 example, only
one threshold is used and so only one slider need be
displayed on the confidence score value manipulation
control 11. However, more than one threshold can be used to
map metadata to confidence scores and so an additional
slider can be displayed. Further, other graphical controls,
such as dials, up/down buttons, numerical indicators, etc., in
addition to or in place of the sliders can be used. The
confidence score value manipulation control 11 can also be
used to adjust mapping control values other than thresholds,
such as filter coefficient values, or any other value used to
change the way metadata is mapped to confidence score
values.
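
The effect of a user-adjustable threshold on the mapping can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, values exactly equal to the threshold are assumed to map to zero, and the two-threshold variant (a linear ramp between a lower and an upper threshold) is just one assumed way in which more than one threshold could be used.

```python
from typing import List, Sequence


def map_with_threshold(metadata: Sequence[float], threshold: float) -> List[float]:
    """Single-threshold mapping: 1.0 above the threshold, otherwise 0.0."""
    return [1.0 if value > threshold else 0.0 for value in metadata]


def map_with_two_thresholds(
    metadata: Sequence[float], low: float, high: float
) -> List[float]:
    """Assumed two-threshold mapping: 0 at or below `low`, 1 at or above
    `high`, and a linear ramp in between (a simple non-binary mapping)."""
    if not low < high:
        raise ValueError("low must be smaller than high")
    scores = []
    for value in metadata:
        ramp = (value - low) / (high - low)
        scores.append(min(1.0, max(0.0, ramp)))  # clamp to [0, 1]
    return scores


# Moving the slider (here, lowering the threshold) makes the mapping
# more sensitive: more metadata values are reported as likely features.
frame_differences = [0.1, 0.9, 0.5, 0.7]
strict = map_with_threshold(frame_differences, 0.6)
sensitive = map_with_threshold(frame_differences, 0.4)
```

With the stricter threshold, only the second and fourth values are marked as likely shot boundaries; lowering the threshold also marks the third.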

After confidence score values are determined for a
feature, the confidence score information is preferably
provided to a user as an aid in browsing a media file as
discussed above. FIGS. 3-9 show additional examples of
how confidence score information can be presented to a user.
These examples are not intended to be exhaustive and other
methods can be used to provide confidence score
information. For example, confidence score information can be
conveyed to stimulate human senses other than sight,
including sound, touch (e.g., by activating a Braille touchpad) or
smell.

The examples shown in FIGS. 3-6 involve a selected
feature "speaker MD". This feature represents a speaker
depicted and/or recorded in a media file, where MD
represents the speaker's initials. In this example, metadata related
to the speaker's identity can be generated using face
recognition techniques, in the case where the speaker is identified
"visually", or using speech recognition techniques, where the
speaker is identified as having a particular speech pattern,
for example. The speaker's identity can also be verified
manually by a user. For example, a user can review a media
file and enter metadata directly into the system indicating
whether a particular speaker is featured in the media file. The
user can also enter metadata for other feature types, thereby
eliminating a need to have the metadata determined
automatically.

FIG. 3 shows an example feature indication window 6
containing icons 8. The icons 8 represent likely portions in
a media file where the speaker MD either talks or is
represented visually. In this example, portions where the
speaker MD either talks or is represented visually are
identified as those portions for which the confidence score
exceeds a given threshold. Each icon 8 optionally contains a
representation of speaker MD's face or is a thumbnail
keyframe extracted from the media file at a corresponding
point in the media file. Preferably, when a user clicks on an
icon 8, a representation of the media file at a point
corresponding in time is presented to the user.
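
Locating the portions at which icons could be placed can be sketched as follows. This is a minimal sketch under stated assumptions: the confidence scores are taken to be sampled at a fixed period, each contiguous run of scores above the threshold is treated as one portion, and the name segments_above and the sample_period parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def segments_above(
    scores: Sequence[float], threshold: float, sample_period: float = 1.0
) -> List[Tuple[float, float]]:
    """Return (start_time, end_time) pairs, in seconds, for each contiguous
    run of confidence scores that exceeds the threshold."""
    segments = []
    start = None  # index at which the current run began, if any
    for index, score in enumerate(scores):
        if score > threshold and start is None:
            start = index
        elif score <= threshold and start is not None:
            segments.append((start * sample_period, index * sample_period))
            start = None
    if start is not None:  # a run that extends to the end of the file
        segments.append((start * sample_period, len(scores) * sample_period))
    return segments


# One icon per segment; clicking an icon would seek to the segment start.
speaker_md_scores = [0.1, 0.8, 0.9, 0.2, 0.7, 0.95]
icon_times = [
    start for start, _ in segments_above(speaker_md_scores, 0.5, sample_period=2.0)
]
```

In this sketch, each segment start time doubles as the playback position associated with the icon.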

In FIG. 4, portions of the media file that most likely 
contain the speaker MD are indicated by shaded portions in 
the feature indication window 6. Optionally, the shaded 
portions can include color. For example, speaker MD could 
be assigned the color green, and all green portions displayed 
in the feature indication window 6 indicate portions of the 
media file most likely containing the speaker MD. In this 
example, as in all cases, the display is generated based on the 
corresponding confidence score, which may not always 
accurately indicate if the corresponding feature is present in 
the media file. Instead, as discussed above, the metadata and
corresponding confidence score represent a probability or
"best guess" regarding whether the feature is present in the
media file. This may be true even when a user enters the 
metadata into the system. For example, a user may not be 
certain that a particular person is speaking in a media file 
even though the user is very familiar with the person's voice. 

In FIG. 5, the confidence score corresponding to speaker 
MD, or a normalized version of the confidence score, is 
displayed in the feature indication window 6. In this type of 
display, the metadata corresponding to speaker MD can be 
directly mapped on a one-to-one basis to the confidence 
score. That is, the metadata values are the same as the 
corresponding confidence score values. As can be seen in 
FIG. 5, the confidence score values may not always indicate 
that the speaker MD is either positively included or not 
included in the media file. Rather, the confidence score
values can vary, indicating an uncertainty regarding whether
the corresponding feature is present in the media file. This
uncertainty may be due to noise in the media data stream,
two or more speakers speaking simultaneously, or a
portion of the speaker's face being obscured.

In FIG. 6, the confidence score values shown in FIG. 5 
have been thresholded to generate the display in the feature 
indication window 6. In this example, any confidence score 
values that fall below the maximum value are clipped to the 
lowest possible value. As discussed above, a user can vary 
the threshold values used to process the confidence score 
values to generate the feature indication using the confi- 
dence score value manipulation control 11, or otherwise 
vary how the feature indication is presented based on the 
confidence score values. 

FIGS. 7-9 relate to the display of confidence score 
information for a video cut feature in a media segment. In 
FIG. 7, the confidence score values are shown without 
manipulation in the feature indication window 6. In this 
case, increasing confidence score values indicate higher 
probabilities that a video cut has been detected. In FIG. 8, 
the video cut feature indication shown in the feature indi- 
cation window 6 is generated based on integrating the 
confidence score values over time. This type of indication 
can be useful for determining the total number of video cuts 
in a media segment or in determining how much the video 
images change within the media segment. Thus, the change 
in the integrated value is proportional to the fraction of total 
accumulated feature change in the media file. Optionally, the 
curve indicating the integrated value of the confidence score 
values shown in FIG. 8 can vary in color as the curve 
represents an increase in value. For example, the curve could 
start as a blue color for relatively low values and gradually 
change to a red color for maximum integration values. That
is, the curve could start as a blue color and as the curve
increases in value, the curve gradually changes through
various shades of purple of increasing red content. FIG. 9
shows an alternate representation of the integrated confidence
score value where the integrated value is represented
only by a corresponding color change in the feature indication
window 6 without display of the integration curve itself.
In this example, the change in color represents the total
accumulated feature change in the media file.

The graphical user interface 10 preferably includes an
option for combining features and displaying confidence
score information based on the feature combination, as
shown in FIG. 10. In this example, the features "speaker
MD" and "applause" are combined, and the corresponding
feature indication window 6 indicates media portions where
the speaker MD is depicted during applause. This type of
combination can be useful to locate portions in a media file
where a particular speaker begins and/or ends a presentation
or has made a particularly interesting comment during a
presentation.

FIG. 10 shows another combination of the features
"silence" and "shot detect". The corresponding feature indication
window 6 indicates portions in the media file where
a shot detect and silence are simultaneously present. This
combination can be useful to indicate the beginning or end
of a desired portion in the media file.

As will be appreciated, other feature combinations are
possible. For example, a combination of speech detection
and motion detection could provide useful information
regarding gestures made by a speaker. In addition, more than
two features can be combined to generate more sophisticated
feature indication information. Preferably, the graphical user
interface 10 includes a "Combine" button that a user can
click to combine two or more features. However, a user can
define which features to combine by other methods.

As shown in FIG. 11, display of confidence score information
can be conditioned on other features. In this
example, confidence score information related to the feature
"speaker MD" is displayed only when confidence score
information indicates a relatively high probability that the
speaker MD is shown in a video segment of the media file
along with or within two minutes of applause in the
soundtrack of the media file. As will be appreciated by those
of ordinary skill in the art, other feature combinations or
conditions, e.g., Boolean operators, can be used depending
on the type of feature indication information needed for a
specific task and the relevant features.

FIG. 12 is a flowchart of a preferred method for browsing
a media file. In step S10, a user inputs a command indicating
a desire to browse a media file. In step S20, the desired
media file is retrieved. In step S30, the graphical user
interface 10 is preferably displayed to the user, although
other interfaces can be used. In step S40, the user preferably
selects at least one feature of the media file. In step S50,
confidence scores corresponding to the selected features are
retrieved and/or generated. Preferably, metadata values for
all of the selectable features are precomputed and stored
with the media file so that metadata values need not be
generated during a browsing session. However, the metadata
values and corresponding confidence score values can be
generated during a browsing session.

In step S60, feature indications related to the selected
features are displayed to the user. In step S70, the user enters
commands to play back or otherwise manipulate the media
file as desired. For example, a user can click on an icon 8 in
a feature indication window 7 and media playback begins
from a corresponding position in the media file.

FIG. 13 is a schematic block diagram of a preferred
embodiment of a media browser 100 in accordance with the
invention. The media browser 100 preferably includes a
general purpose computer 110 that has a controller 111. The
controller 111 can be implemented as a single special
purpose integrated circuit (e.g., ASIC) each having a main or
central processor section for overall, system-level control,
and separate sections dedicated to performing various different
specific computations, functions and other processes
under control of the central processor section. The controller 111
can also be implemented as a single microprocessor
circuit or a plurality of separate dedicated or
programmable integrated or other electronic circuits or
devices, e.g., hardwired electronic or logic circuits such as
discrete element circuits or programmable logic devices.
The controller 111 and/or the general purpose computer 110
also preferably include other circuitry or components, such
as memory devices, relays, mechanical linkages, communications
devices, etc., to effect desired control and/or input/output
functions.

The general purpose computer 110 also includes a feature
selector 112, a position indicator 113, a feature indicator 114
and a mapping module 115, which are all preferably implemented
as software modules that include sets of instructions
performed by the general purpose computer 110 and/or the
controller 111. However, these elements, like the controller
111, can be implemented as an ASIC or an array of ASICs,
separate dedicated or programmable integrated or other
electronic circuits or other devices.

A media input device 120 preferably inputs a media file in
the form of a data stream to the general purpose computer
110. The media input device 120 can include one or more of
any of the following devices: a semiconductor memory
device, CD-ROM device, magnetic disk drive, other volatile
or non-volatile memory devices, communications devices,
such as a modem, facsimile transmitter or receiver, local-area
or wide-area network devices, telecommunications
devices, or other wired or wireless communications devices.
In short, the media input device 120 can be any device that
transmits a data stream representing a media file to the
general purpose computer 110.

The data stream transmitted from the media input device
120 to the general purpose computer 110 can take any of
many different forms. For example, the data stream can be
analog or digital data that is either compressed or
uncompressed, and the data can be provided in parallel or
serial form at any desired transmission rate using any
desired transmission protocol, if any.

The controller 111 preferably communicates with a display
130, such as a CRT or LCD display. However, the
display 130 is not limited to these types of displays. In short,
the display 130 can include any device capable of presenting
a representation of a media file to a user, including an audio
speaker, visual display device, touch display (such as a
Braille device), etc.

The media browser 100 also preferably includes a user
interface 140, such as a keyboard and/or computer mouse,
that permits a user to communicate with the controller 111.
Other devices can be used with the user interface 140,
including touch screen devices, light pens, voice activated
and/or voice recognition systems, etc.

Where the invention is used over a more distributed
system, such as the Internet, the user interface 140 may
include a general purpose computer and appropriate communications
devices to send and receive signals to a more
remotely located general purpose computer 110, i.e., a
remote server. In this case, the general purpose computer
110 can send display control signals to the user interface
140, which directly controls the display 130. In addition, the 
user interface 140 can perform some or all of the functions 
performed by the general purpose computer 110, and some 
or all of the modules, such as the position indicator 113,
mapping module 115, etc., can operate, at least in part, 
within the user interface 140. The media input 120 can be 
physically located near the general purpose computer 110, 
the user interface 140 (e.g., a CD-ROM drive), or at some 
other location. Alternately, the media input 120 can be
formed of a multitude of media sources, such as magnetic 
hard drives distributed over a wide network. All that is 
required is that the media stored in the media input 120 be 
accessible to the media browser 100. In a preferred mode of
operation, the media browser 100 receives a command from
a user via the user interface 140 indicating a desire to browse 
a media file. The desired media file is then either transmitted 
from the media input device 120 to the general purpose 
computer 110 or the media file is retrieved from a memory 
device provided as part of the controller 111, for example.
The controller 111 preferably sends a signal to the display 
130 to display the graphical user interface 10 to the user. 
Preferably, included in the signal provided to the display 130 
is a command to indicate that the current position in the media file
is the beginning of the media file. That is, the thumb 31 on
the timeline 3 is displayed at the far left position. Of course, 
the thumb 31 need not be displayed, or can be displayed at 
any desired position along the timeline 3. Likewise, a video 
frame, if present, is displayed in the video display portion 1 
corresponding to a current position in the media file. The
signal provided by the controller 111 to the display 130 
regarding the position of the thumb 31 is based on a signal 
provided by the position indicator 113, which keeps track of 
a current position in the media file. 

After viewing the graphical user interface 10, the user
preferably selects at least one feature of the media file using 
the media feature selection 4 and the user interface 140. 
Feature selection indications are provided to the feature 
selector 112, which stores the selected features and provides 
a signal to the controller 111 to display the selected features
in a selected feature window 5 of the interface 10. The 
feature selector 112 also provides an indication to the 
controller 111 if a selected feature is not selectable. For 
example, if a user selects a feature that is not present in a 
media file, e.g., a user selects the shot detect feature for a
media file that does not contain video information, an 
indication that the selected feature is not selectable is 
provided to the user. Preferably, the feature selector 112 
provides a list of selectable features to the controller 111 so 
that only selectable features are displayed in the media
feature selection 4. 
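
By way of illustration only, the selectability check performed by the feature selector 112 might be sketched as follows. The table `FEATURE_REQUIREMENTS`, the `MediaFile` type and both function names are assumptions made for this sketch and are not part of the described embodiment; the sketch simply treats a feature as selectable when the media file contains the kind of track the feature is evaluated on.

```python
from dataclasses import dataclass

# Hypothetical table: the kind of track each feature is evaluated on.
FEATURE_REQUIREMENTS = {
    "speaker MD": "audio",
    "applause": "audio",
    "silence": "audio",
    "motion": "video",
    "shot detect": "video",
}


@dataclass(frozen=True)
class MediaFile:
    name: str
    tracks: frozenset  # e.g. frozenset({"audio", "video"})


def selectable_features(media):
    """Return the features whose required track exists in the media file."""
    return [
        feature
        for feature, track in FEATURE_REQUIREMENTS.items()
        if track in media.tracks
    ]


def select_feature(media, feature):
    """Return True if the feature can be selected for this media file."""
    return feature in selectable_features(media)
```

With an audio-only file, `select_feature(media, "shot detect")` returns `False`, which corresponds to the "not selectable" indication described above.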

Once media features are selected, the mapping module 
115 generates and/or retrieves confidence scores corresponding
to the selected features. Preferably, metadata values
for all of the selectable features are precomputed and
stored with the media file so that metadata values need not 
be generated during a browsing session. However, the 
mapping module 115 can automatically generate metadata 
values and map the metadata values to corresponding
confidence score values "on the fly."
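
The choice between retrieving precomputed metadata and generating it "on the fly" can be illustrated with a minimal sketch. The dictionary `stored`, the callables `analyze` and `to_confidence`, and the function name are hypothetical stand-ins introduced for this example only; they do not describe how the mapping module 115 is actually implemented.

```python
def get_confidence_scores(feature, stored, analyze, to_confidence):
    """Return confidence scores for a feature.

    stored        -- dict of precomputed metadata values kept with the media file
    analyze       -- callable that generates metadata values for a feature
    to_confidence -- callable that maps one metadata value to a confidence score
    """
    metadata = stored.get(feature)
    if metadata is None:
        # Metadata was not precomputed: generate it during the session
        # and keep it so that it is not generated twice.
        metadata = analyze(feature)
        stored[feature] = metadata
    return [to_confidence(value) for value in metadata]
```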

Next, the feature indicator 114 generates and provides 
information to the controller 111 to display feature indica- 
tions related to the selected features. 
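
As a hedged illustration of the feature combinations and conditions described with reference to FIGS. 10 and 11, the following sketch operates on confidence scores sampled at equal time steps. Taking the minimum of two scores as a combination rule, and the names `combine_and` and `condition_on`, are assumptions made for this example; the description leaves the combination function open (e.g., Boolean operators).

```python
def combine_and(scores_a, scores_b):
    """Combine two features so the result is high only where both are high.

    Uses the per-sample minimum, which is one possible (assumed) rule.
    """
    return [min(a, b) for a, b in zip(scores_a, scores_b)]


def condition_on(primary, other, threshold, window):
    """Keep primary[i] only where `other` reaches `threshold`
    within `window` samples of position i; otherwise report 0.0."""
    n = len(primary)
    result = []
    for i in range(n):
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        nearby = any(value >= threshold for value in other[lo:hi])
        result.append(primary[i] if nearby else 0.0)
    return result
```

For the "within two minutes of applause" condition, `window` would be the number of samples that spans two minutes at the chosen sampling interval.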

Based on the displayed feature indications, a user can 
select a portion of the media file for review, or perform other
actions, such as zooming the timeline 3, moving the timeline
thumb 31, etc. When the user makes a decision to review a
portion of the media file, the user can input signals to the
media browser 100 to playback or otherwise manipulate the 
media file. For example, the user can click on an icon 8 in 
a feature indication window 7 and the controller 111 controls 
the display 130 to provide a representation of the media file 
starting from a time-wise position in the media file consis- 
tent with the selected icon 8. 

FIG. 14 is a flowchart that describes in further detail how 
a user chooses a media file for browsing and provides a 
corresponding signal to the media browser 100. In step 
S110, the controller 111 sends a signal to the display 130 to 
display a media selection screen. Preferably, the displayed 
selection screen includes at least the display shown in FIG. 
15. The selection screen preferably includes several rows 
200, each row 200 displaying information for a correspond- 
ing media file. Each row 200 preferably includes such 
information as a keyframe 201, a timeline 202, a media file 
name 203, identifying information, such as a date, 204, and 
a keyword listing 205. The keyframe 201 is preferably 
selected from a particularly relevant portion of the media file 
or otherwise visually conveys information related to the 
overall content of the media file. The timeline 202 is similar 
to the timeline 3 in the graphical user interface 10. The 
timeline 202 preferably displays information related to a 
selected media feature, e.g., shot boundaries. Preferably, a 
user can view keyframes or other video information when a 
cursor 206 is moved over the timeline 202. For example, 
when a user moves the cursor 206, a keyframe in the media 
file that is nearest in time to the cursor 206 position is 
displayed to the user. Keyframes are represented on the 
timeline 202 by a small icon that preferably changes appear- 
ance when the icon is nearest the cursor 206. Keyframes can 
be automatically selected based on confidence score infor- 
mation or can be previously identified by an operator. 
Alternately, the media browser 100 could display a pre- 
defined summary video sequence or another shortened rep- 
resentation that provides a user additional information to 
make a selection decision. The timeline 202 can also provide 
other information related to media features in any of the 
ways described above. For example, icons, color sequences 
or keyframes could be displayed on the timeline 202 to 
provide media feature information. The media feature infor- 
mation can be displayed based on metadata and confidence 
score information or based on predefined summary infor- 
mation provided by a user. 
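
The keyframe lookup described above, in which the keyframe nearest in time to the cursor 206 is displayed, might be sketched as follows. The function names, the pixel-based cursor position and the assumption that keyframe times are kept in ascending order are all introduced for this example only.

```python
from bisect import bisect_left


def cursor_to_time(cursor_x, timeline_width, duration):
    """Convert a horizontal cursor position on the timeline to a media time."""
    return duration * cursor_x / timeline_width


def nearest_keyframe(keyframe_times, cursor_time):
    """Return the keyframe time closest to cursor_time.

    keyframe_times must be sorted in ascending order.
    """
    if not keyframe_times:
        raise ValueError("media file has no keyframes")
    i = bisect_left(keyframe_times, cursor_time)
    if i == 0:
        return keyframe_times[0]
    if i == len(keyframe_times):
        return keyframe_times[-1]
    before, after = keyframe_times[i - 1], keyframe_times[i]
    # Ties are resolved in favor of the earlier keyframe.
    return before if cursor_time - before <= after - cursor_time else after
```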

Preferably, a user can search a database of media files by 
the name 203, identifying information 204 and/or keywords 
205 to narrow an initial search and generate an initial display 
of media file rows 200 on the selection screen. Preferably, 
the user can search using any known search screen, such as 
those associated with electronic database applications or 
Internet browser/search engines. In fact, the media browser 
100 can be very useful in browsing media files posted on a 
large network, such as the Internet, since the media browser 
100 allows a user to rapidly and accurately identify and 
experience desired portions in a media file. To speed review
and selection, the media displayed on the initial selection 
screen are preferably sorted, placing those media files most 
relevant to the search at the top. Sorting can be done based 
on confidence score information, search results (e.g., num- 
ber of hits in a media file), or other desired criteria. 
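
One possible ordering of the media file rows 200 is sketched below, purely as an example. The `MediaRow` fields and the choice to rank first by number of hits and then by peak confidence score are assumptions; the description permits any desired sorting criteria.

```python
from dataclasses import dataclass


@dataclass
class MediaRow:
    name: str
    hits: int                # number of search hits in the media file
    peak_confidence: float   # highest confidence score for the selected feature


def sort_rows(rows):
    """Place the most relevant media files at the top of the selection screen."""
    return sorted(rows, key=lambda r: (r.hits, r.peak_confidence), reverse=True)
```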

In step S120, the user reviews the media files, either by 
viewing keyframes, keywords, selected portions of the 
media files using the timeline 202 features, or by reviewing 
the media files in any of the ways discussed above. In step 
S130, the user selects one or more of the media files for 
browsing. Preferably, the user selects a media file by clicking
on an associated keyframe 201, but the user can select
media files in other ways, such as by entering a media file 
name or other identifier by keyboard. 

FIG. 16 is a flowchart that describes in further detail how 
the mapping module 115 generates and/or retrieves confidence
scores. In step S510, the mapping module 115 generates
and/or retrieves metadata information for the selected
features. As described above, the metadata information is 
preferably precomputed and stored along with the media 
file. However, if metadata for a selected feature is not stored
with the media file, the mapping module 115 can apply any 
of the above described automatic techniques to the media 
file to generate metadata information. In addition, the map- 
ping module 115 can receive manually input metadata 
information from a user.

In step S520, the mapping module 115 optionally pro- 
cesses the metadata information as desired in preparation for 
mapping the metadata to confidence scores. For example, 
the mapping module 115 thresholds, combines, filters or 
otherwise processes the metadata information in preparation
for mapping the metadata information to confidence scores. 

In step S530, the mapping module 115 maps the processed
metadata information to confidence score values for each
selected feature using a desired function, e.g., a one-to-one
mapping, other linear or non-linear functions, etc.
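
Steps S520 and S530 can be illustrated with a short sketch. The specific functions shown (a floor threshold, a clamped linear scaling and a logistic curve) and their parameter names are assumptions chosen for this example; the description only requires some desired linear or non-linear function. The `steepness` parameter stands in for a user-adjustable control value.

```python
import math


def threshold(values, floor):
    """Step S520 (one option): zero out metadata values below `floor`."""
    return [v if v >= floor else 0.0 for v in values]


def linear_map(value, lo, hi):
    """Linear mapping of a metadata value from [lo, hi] onto [0, 1], clamped."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def logistic_map(value, midpoint, steepness):
    """Non-linear mapping: values near `midpoint` map to about 0.5."""
    return 1.0 / (1.0 + math.exp(-steepness * (value - midpoint)))


def map_metadata(values, mapping):
    """Step S530: apply the chosen mapping to every metadata value."""
    return [mapping(v) for v in values]
```

For example, `map_metadata(threshold(raw, 2.0), lambda v: linear_map(v, 0.0, 10.0))` processes and then maps a list `raw` of metadata values in one pass.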

FIG. 17 is a flowchart describing in further detail how the
feature indicator 114 provides feature information for the
selected features. In step S610, the feature indicator 114
processes the confidence scores for each selected feature as
desired. For example, the feature indicator 114 thresholds,
filters, combines, etc., the confidence score values as discussed
above in preparation for providing feature information.

In step S620, the feature indicator 114 generates information
to display feature indications for each of the selected
features based on the processed confidence scores.
For example, the feature indicator 114 generates signals to
display feature indications in a feature indication window 6.

FIG. 18 is a flowchart describing in further detail how the
controller 111 controls media playback as desired. In step
S710, a user optionally selects features using the key playback
buttons 9 on the graphical user interface 10. For
example, the user could select the "speaker MD" feature. In
step S720, the controller 111 inputs a playback control signal
received from a user and adjusts the media playback based
on the features selected in step S710. For example, the user
could select the play button, and the media browser 100
adjusts the replay of the media file based on the "speaker
MD" feature. As one example, the media browser 100 could
adjust the playback speed according to Equation 2:

R = C + M(1 - C)    (2)

where R is the playback rate, C is the confidence score for
the speaker MD feature, and M is the maximum playback rate,
which is preferably set by a user. In this example, the
confidence score C varies between 0 and 1. Therefore, as the
confidence score increases (indicating that the speaker MD
is speaking or is depicted in the media file), the playback rate
approaches the value C. In contrast, as the confidence score
decreases (indicating that the speaker MD is not featured in
the media file), the playback rate approaches the maximum
playback rate M, thereby speeding replay in sections where
the speaker MD is not present in the media file.

This is only one example of how the media browser 100
can control the media playback controls based on a selected
feature. The media browser 100 can also control the stop
button to stop playback at a point in the media file where a
feature is no longer present (e.g., a speaker leaves the room)
or where a feature next occurs (e.g., stop at the next shot
boundary). As discussed above, the index controls can be
controlled to jump to a point in the media file where a feature
is present in or absent from the media file. Further, the
timeline thumb 31 can be controlled based on a selected
feature. For example, the thumb 31 could change its
appearance, e.g., color, when the thumb 31 is moved over a
media file portion where a selected feature is present, or the
thumb 31 could be "indexed" so that a user can only move
the thumb 31 within portions of the media file where a
selected feature is present or absent. Preferably, control of
the playback controls, including the thumb 31, is based on
the confidence score value for a selected feature. Determination
of whether a feature is present or not in a media file
is preferably based on the confidence score value, e.g., if the
confidence score exceeds/is below a desired threshold the
media browser 100 determines that the feature is present
in/absent from the media file.

FIG. 19 is a flowchart describing in further detail another
way in which the controller 111 optionally controls media
playback. In step S730, a user selects a confidence score
indication in a feature indication window 6. For example, a user
can select (e.g., double click) an icon 8 displayed in a feature
indication window 6. In step S740, the controller 111
accesses the media file and controls the display 130 to begin
playback at a point in the media file corresponding to the
selected confidence score indication. Playback can be of
selected portions of the media file of particular interest
determined based on confidence score values.
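
The playback control discussed with reference to FIG. 18 can be illustrated with a minimal sketch of Equation 2, R = C + M(1 - C), together with a threshold test for feature presence. The function names, the default threshold of 0.5 and the representation of confidence scores as a list of equally spaced samples are assumptions made for this example.

```python
def playback_rate(confidence, max_rate):
    """Equation 2: R = C + M(1 - C), with C in [0, 1]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie between 0 and 1")
    return confidence + max_rate * (1.0 - confidence)


def feature_present(confidence, threshold=0.5):
    """Treat the feature as present when its score exceeds the threshold."""
    return confidence > threshold


def next_stop(scores, start, threshold=0.5):
    """Return the index of the first sample after `start` at which the
    feature is no longer present, or None if it stays present."""
    for i in range(start + 1, len(scores)):
        if not feature_present(scores[i], threshold):
            return i
    return None
```

With a maximum rate M of 4, a confidence score of 1 gives a rate of 1, a score of 0 gives a rate of 4, and a score of 0.5 gives a rate of 2.5.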



While the invention has been described with reference to 
specific embodiments, the description of the specific 
embodiments is illustrative only and is not to be construed 
as limiting the scope of the invention. Various other modi- 
fications and changes may occur to those skilled in the art 
without departing from the spirit and scope of the invention 
as set forth herein. 

What is claimed is: 

1. A media browser, comprising: 

a feature selection module that inputs at least one media 
feature selected by a user; 

a media presentation device that provides an aural and/or 
visual representation of a media file being browsed; and 

a feature indicator that provides information related to a 
confidence score for at least one selected media feature, 
the confidence score being related to a presence of a 
corresponding selected media feature in the media file; 

a confidence score mapping module that maps a metadata 
value representing a time-wise evaluation of a media 
feature in the media file to a corresponding confidence 
score, 

wherein the selected media feature is not used to perform, 
and is not related, to a text search of the media file. 

2. The media browser of claim 1, wherein the confidence 
score mapping module performs a non-linear mapping of the 
metadata value to the confidence score. 

3. The media browser of claim 1, wherein mapping of the 
metadata value to the confidence score is adjusted based on 
a user-defined control value. 

4. The media browser of claim 1, wherein the metadata 
value is automatically determined. 

5. The media browser of claim 1, wherein the metadata 
value is input by a user. 

6. The media browser of claim 1, wherein the feature 
indicator provides information related to the confidence 
score based on a value of another confidence score. 
7. The media browser of claim 1, wherein the feature
indicator combines at least two selected confidence scores
and provides information related to the combined confidence
scores.

8. The media browser of claim 1, wherein the media
presentation device comprises user controls that control
presentation of the aural and/or visual representation of the
media file, and an effect of the user controls on the presentation
is adjusted based on at least one confidence score.

9. A method of browsing a media file, comprising:

selecting a feature of a media file being browsed, the
feature not being used to perform and not related to a
text search of the media file;

providing information related to a confidence score for at
least one selected feature relative to a segment of the
media file, the confidence score related to the existence
of a corresponding selected feature in the media file;

mapping metadata representing a time-wise evaluation of
a feature in the media file to a corresponding confidence
score; and

selecting a portion of the media file based on the information
related to a confidence score for at least one
selected feature.

10. The method of claim 9, wherein the mapping is a
non-linear mapping.

11. The method of claim 9, further comprising adjusting
the mapping based on a user-defined control value.

12. The method of claim 9, further comprising the step of
automatically determining the metadata.

13. The method of claim 9, further comprising the step of
receiving metadata input by a user.

14. The method of claim 9, wherein the step of providing
information further comprises providing information related
to a confidence score based on a value of another confidence
score.

15. The method of claim 9, further comprising the steps
of: combining at least two selected confidence scores; and

providing information related to the combined confidence
scores.

16. The method of claim 14, further comprising the steps
of inputting a user control value for controlling a presentation
of an aural and/or visual representation of the media file;
and

adjusting the effect of the user controls based on at least
one confidence score.

* * * * *